Popular Encoding Formats: JSON

CSV is one of the more basic human-readable encodings that DevOps engineers will encounter, but it is by no means the only one. Within the last two decades, several new formats have emerged that are used to transfer information or provide configuration to applications.

JavaScript Object Notation (JSON) is a data serialization format that was designed to convert JavaScript objects into a textual representation so that they could be saved or transferred. This notation, due to its simplicity and clarity, has been adopted by almost every language to transfer data.


YAML Ain't Markup Language (YAML) is another data serialization format that is often used to store configuration information for a service. YAML is the primary configuration language in Kubernetes clusters.

In this lesson, we'll look at the ways to marshal and unmarshal data from Go types into these formats and back into the Go type.

The Go field tags#

Go has a feature called field tags that allow a developer to add string tags to struct fields. This allows a Go program to inspect the extra metadata regarding a field before performing an operation. Tags are key/value pairs:

Adding a tag to the Last field

In the preceding code snippet, we can see a struct type with a field called Last that has a field tag. The field tag is an inline raw string. Raw strings are denoted by backticks. This will produce a tag with a key of "json" and a value of "last_name".

Go packages can use the reflect package to read these tags. These tags allow a package to change the behavior of an operation based on the tag data. In this example, it tells our JSON encoder package to use last_name instead of Last when writing data to JSON and the reverse when reading data. This feature is key for packages that handle data marshaling.

JSON#

Over the past decade, the JSON format has become the de facto format for data encoding to disk and for communicating via RPC to services. No language in the cloud space can be successful without supporting JSON.

A developer might encounter JSON as an application configuration language, but it is poorly suited for this task due to the following reasons:

  • The lack of multiline strings

  • The inability to have comments

  • The pickiness regarding its punctuation (that is, good for machines, and bad for humans)

For the interchange of data, JSON can be quite useful with only a few downsides, such as the following:

  • Schemaless

  • Non-binary format

  • Lack of byte array support

A schema is a definition of a message’s content that lives outside code. Schemaless means there is no strict definition of what a message contains. This means that, for every language that is supported, we must create definitions for our messages in that language. Formats such as protocol buffers have entered into this space to provide a schema that can be used to generate code for any language.

JSON is also a human-readable format. These types of formats are not as efficient as binary formats in terms of size and speed. This generally matters when trying to scale large services. However, many prefer human-readable formats due to their ability to be easily debugged.


JSON’s lack of support for byte arrays is another shortcoming. JSON can still transfer raw bytes, but doing so requires encoding and decoding them with base64 and storing them in JSON's string type. This adds an extra level of encoding that should be unnecessary. There are supersets of JSON that contain a byte array type (such as Binary JSON, or BSON for short), but they are not widely supported.

JSON is delivered to a user in one of several ways:

  • As a single message that can contain sub-messages.

  • As an array of JSON messages.

  • As a stream of JSON messages.

JSON started as a format for simply encoding a JavaScript object for transfer. However, as its uses grew, sending large messages or streams of messages became a use case. Single, large messages can be hard to decode. Generally, JSON decoders are written to read the entire message into memory and validate its content.

To simplify large sets of messages or streaming content, we might encounter a set of messages surrounded by brackets, [], or individual messages separated by newlines. Neither form is valid JSON as originally intended, but both have become de facto standards for handling large sets of data as small, individual messages that make up part of a whole stream.

Because JSON is a standard part of the cloud ecosystem, Go has built-in language support in the standard library's encoding/json package. In the upcoming sections, we'll detail the most common ways to use the JSON package.

Marshaling and unmarshaling to maps#

Because JSON is schemaless, it is possible to have messages of different types in a stream or files. This is usually undesirable, and it is better to have a top-level message that holds these types of messages.

When we need to handle multiple message types or do discovery on a message, Go allows us to decode messages into map[string]interface{}, where the string key represents the field name and interface{} represents the value. Let's examine an example of unmarshaling a file into a map:

JSON unmarshaling to a map
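A minimal sketch of this example follows. The helper name `getName` is hypothetical, and the JSON content is inlined here instead of being read from data.json, so the line numbers in the walkthrough refer to the original listing:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// getName (a hypothetical helper) unmarshals raw JSON into a map and
// extracts the "name" key if it exists and is a string.
func getName(b []byte) (string, error) {
	data := map[string]interface{}{}
	if err := json.Unmarshal(b, &data); err != nil {
		return "", err
	}
	v, ok := data["name"]
	if !ok {
		return "", fmt.Errorf("json does not contain key 'name'")
	}
	// Type assert the interface{} value to a concrete string.
	if s, ok := v.(string); ok {
		return s, nil
	}
	return "", fmt.Errorf("key 'name' is not a string")
}

func main() {
	// In the lesson, this content is read from data.json.
	b := []byte(`{"name":"John Doak","age":100}`)
	name, err := getName(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // John Doak
}
```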

The preceding example does the following:

  • Line 10: It reads the content of the data.json file into variable b.

  • Line 14: It creates a map, called data, to store our JSON content.

  • Line 16: It unmarshals the raw bytes representing the JSON into data.

  • Line 18: It looks up the name key in data.

  • Line 20: If name does not exist, we return an error.

  • Lines 23-26: If the value is a string, we return the content.

Using the map, we can explore the values in the data to discover a message type, type assert the interface{} value to a concrete type, and then use the concrete value. Remember that type assertion converts an interface variable into another interface variable or a concrete type such as string or int64.

Using a map is the hardest method of decoding JSON. It is only recommended in cases where the JSON is unpredictable and there is no control over the data provider. It is usually better to have whatever is providing the data change its behavior than to decode in this way.

Marshaling a map into JSON is simple:

Marshaling a map

json.Marshal() will read our map and output valid JSON for its contents. []byte values are automatically base64 encoded into JSON's string type.

Marshaling and unmarshaling to structs#

The preferred method of JSON decoding is to decode into a Go struct type that represents the data. Here is an example of how to create a user record struct, which we'll use to decode a JSON stream:

JSON marshaling a struct

This code does the following:

  • Line 1: It defines a Record type.

  • Lines 2–3: It uses field tags to tell JSON what the output field mapping should be.

  • Line 5: It uses a field tag of - on Age so that it will not be marshaled.

  • Line 9: It creates a Record type called rec.

  • Line 16: It marshals rec to JSON.

  • Line 20: It prints the JSON.

Notice that the Name field was translated to user_name and User to user. The ID field was unchanged in the output because we did not use a field tag. Age was not output because we used a field tag of -.

Struct fields that begin with a lowercase letter are private (unexported) and will not be marshaled. This is because the JSON marshaler lives in a different package and cannot access unexported fields of our type.

We can read about the field tags that JSON supports in the encoding/json GoDoc, located under Marshal(). The JSON package also includes MarshalIndent(), which can be used to output more readable JSON with line breaks between fields and indentation.

Decoding data into a struct type, such as Record earlier, can be done as follows:

Decoding data into a struct

This transforms text that represents the JSON into a Record type stored in the rec variable.

Marshaling and unmarshaling large messages#

Sometimes, we might receive a stream of JSON messages or a file that contains a list of JSON messages. Go provides json.Decoder to handle a series of messages. Here is an example borrowed from the GoDoc, where each message is separated by a newline:

Handling a stream of messages with Decoder

This example does the following:

  • Lines 16–18: It defines a Message struct.

  • Line 21: It wraps the jsonStream raw output in an io.Reader via strings.NewReader().

  • Lines 26–39: It starts a goroutine that decodes the messages as they are read and puts them on a channel.

  • Lines 40–42: It reads all messages that are sent until the output channel is closed.

  • Lines 43–45: It prints out any errors that are encountered.

Sometimes, this format of streaming will have brackets, [], around the messages and use commas as separators between the entries.

In this case, we can utilize another feature of the decoder, dec.Token(), to remove them safely:

dec.Token usage

This code works in the same way, except it removes the outer brackets and requires a comma-delimited list instead.

Encoding data in a stream is very similar to decoding. We can write JSON messages to an io.Writer to output a stream. Here's an example:

Encoding JSON data in a stream

This code does the following:

  • Lines 15–19: It reads from a channel of Message and writes to an io.Writer.

  • Lines 21–26: It returns a channel that signals when the encoder is done processing.

  • Line 27: If an error is returned, it means that the encoder had a problem.

This outputs the JSON as separated values without brackets.

JSON final thoughts#

The encoding/json package has support for other methods of decoding that are not covered here. We can mix map[string]interface{} into our struct types and vice versa, or we can decode each field and value individually.

However, the best use cases are those that are straightforward struct types as a single value or stream of values. This is why encoding/json is our first choice when encoding or decoding JSON values. It is not the fastest method, but it is the most flexible. There are other third-party libraries that can increase your throughput while sacrificing some flexibility.
